I am Emily Josephs
- interested in: evolutionary genetics, plants, triathlons, cats
October 17, 2024
Data into spreadsheet
Wrangle data into R
Visualize data
Build a model
Use algorithm to parameterize model
Interpret results
Still using R, but focus won’t be on the language itself!
This part of the course may feel like it moves a little faster.
There will be math!
You will be learning a new skill while relying on a skill that is still young, which is hard!
The material will build on itself.
I am learning along with you!!!
ALSO:
Everything else in the world is happening!!!
take a deep breath!
believe in yourself!!!
remember that your primary goal is present understanding, but a solid secondary goal is future understanding.
do the homework and the self-reflections.
visit me and Sophie in office hours
Emily’s office hours:
noon-1pm Tuesday + Thursday in PLB 266
Sophie’s office hours:
11am-noon Monday + Wednesday in the EEB Hub and on Zoom
Learning goals:
Learn what probability is and the law of total probability.
Write basic simulations in R to build intuition about:
-simple mutually-exclusive events
-complex mutually-exclusive events
-shared non-exclusive events
-conditional probabilities
Why do we care about probability?
Many important biological processes are influenced by chance
We don’t want to tell science stories about coincidences.
Understanding probability helps us understand statistics.
A measure of the likelihood that an event will occur
A long-term frequency
How strongly we believe an event will happen (‘degree of belief’)
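The "long-term frequency" definition can be seen in a quick simulation: the running proportion of heads in a series of simulated fair-coin flips drifts toward 0.5 as the number of flips grows. A minimal sketch, assuming a fair coin; `nFlips`, `flips`, and `runningProp` are illustrative names, not from the course materials.

```r
set.seed(1)  # make the random draws reproducible
nFlips <- 10000
# 1 = heads, 0 = tails, each with probability 0.5
flips <- sample(c(1, 0), size = nFlips, replace = TRUE)
# proportion of heads after 1, 2, ..., nFlips flips
runningProp <- cumsum(flips) / seq_len(nFlips)
plot(runningProp, type = "l", log = "x",
     xlab = "number of flips", ylab = "proportion heads")
abline(h = 0.5, lty = 2)  # dashed line at the true probability
```

Early on the line jumps around a lot; after thousands of flips it stays close to the dashed line.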
Will I get to Park Place?
Which definition matches the probability of landing on Park Place?
When will I give birth?
Which definition matches the probability of giving birth on a specific day?
Who will win the election?
Which definition matches the probability of an election outcome?
Building an intuition for how the rules of probability work using simulations.
Simulations?
One of the most powerful tools in our statistics learning toolkit is simulation.
Simulations let you generate data that you know should look a certain way, so you can test your intuitions.
Simulations also let you do the same thing over and over and over.
thisClass = 1:35
thisClass
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34 35
sample(thisClass, size=1)
## [1] 31
sample(thisClass, size=1)
## [1] 15
sample(thisClass, size=1)
## [1] 14
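Each call to `sample()` above returns a different student because R draws a fresh pseudo-random number each time. When a simulation needs to be repeatable (for example, to compare answers with a classmate), `set.seed()` fixes the random number generator's starting point. A small sketch; the seed value 42 and the names `firstDraw` and `secondDraw` are arbitrary choices for illustration.

```r
thisClass <- 1:35
set.seed(42)
firstDraw <- sample(thisClass, size = 3)
set.seed(42)   # reset the generator to the same starting point
secondDraw <- sample(thisClass, size = 3)
identical(firstDraw, secondDraw)  # TRUE: same seed, same draws
```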
If we randomly pick a student, how likely are we to select student #19?
If we randomly pick 10 students, how likely are we to select student #19?
If we randomly pick 2 students, how likely are we to select students #19 and #20?
If we randomly pick a student, how likely are we to select student #19?
mySamples <- replicate(10000, sample(thisClass, size=1))
sum(mySamples==19)/10000
## [1] 0.0304
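The other two questions can be simulated the same way. A sketch, assuming students are picked without replacement (the default for `sample()`), so nobody is chosen twice in one draw; `has19` and `has19and20` are illustrative names. For comparison, the exact answers are 10/35 ≈ 0.286 for picking 10 students and 1/choose(35, 2) = 1/595 ≈ 0.0017 for picking exactly #19 and #20.

```r
thisClass <- 1:35
nReps <- 10000

# Pick 10 students; record whether student #19 is among them
has19 <- replicate(nReps, 19 %in% sample(thisClass, size = 10))
mean(has19)        # compare with 10/35

# Pick 2 students; record whether they are #19 and #20 (in either order)
has19and20 <- replicate(nReps, all(c(19, 20) %in% sample(thisClass, size = 2)))
mean(has19and20)   # compare with 1/choose(35, 2)
```

`mean()` of a TRUE/FALSE vector gives the proportion of TRUEs, which is the same as `sum(...)/nReps`.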
To think about probability, we start with the set of all potential outcomes (the “sample space”)
For example, when you flip a coin the potential outcomes are heads and tails.
What was the sample space for our class example?
For the ball example, the sample space is: (A) the ball falls through the orange bin, (B) the ball falls through the green bin, and (C) the ball falls through the blue bin.
If we do something, one of the things in the sample space will happen.
Therefore, the probabilities of all the outcomes in the sample space will sum to one.
This is the Law of Total Probability
Looking at a single snapshot from the previous example:
What proportion of balls belong to each outcome?
A = 10/15
B = 2/15
C = 3/15
These proportions are an estimate of the true probability based on one sample.
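As the Law of Total Probability requires, these estimated proportions add up to one, because every ball falls through exactly one bin:

\[ P(A) + P(B) + P(C) \approx \frac{10}{15} + \frac{2}{15} + \frac{3}{15} = \frac{15}{15} = 1 \]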
mycols = c("#EDA158","#62CAA7","#98C5EB")
barplot(c(10/15,2/15,3/15), col=mycols, names.arg = c('A','B','C'),
xlab = "sample space", ylab = "probability")
Outcomes that cannot occur at the same time are mutually exclusive.
For example, a coin could be heads or tails but not both.
But a coin could be both heads up and a quarter. These are non-exclusive events.
Let’s write a simulation of the ball example with 500 balls.
myN <- 500
pA = 3/6
pB = 1/6
pC = 2/6
mySample = sample(x=c("A","B","C"),
size = myN,
replace=TRUE,
prob = c(pA, pB, pC))
propA = sum(mySample=="A")/myN
propB = sum(mySample=="B")/myN
propC = sum(mySample=="C")/myN
mycols = c("#EDA158","#62CAA7","#98C5EB")
barplot(c(propA, propB, propC), col=mycols, xlab = "outcome",
ylab = "proportion", names.arg = c('A','B','C'))
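One way to check this simulation against the Law of Total Probability: because every ball lands in exactly one bin, the three simulated proportions always add up to one, and each should sit close to the probability it was simulated from. A sketch that re-creates the setup so it runs on its own; `props` is an illustrative name, and `mean(mySample == "A")` is shorthand for `sum(mySample == "A")/myN`.

```r
myN <- 500
pA <- 3/6; pB <- 1/6; pC <- 2/6
mySample <- sample(x = c("A", "B", "C"), size = myN,
                   replace = TRUE, prob = c(pA, pB, pC))
props <- c(A = mean(mySample == "A"),
           B = mean(mySample == "B"),
           C = mean(mySample == "C"))
sum(props)               # 1, up to floating-point rounding
props - c(pA, pB, pC)    # simulated minus true probability: small, but not zero
```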
Work with your groups to calculate the probability of A or B happening.
Note that this is called a complex event, since it is the combination of multiple, mutually-exclusive outcomes.
One way to solve the problem:
propAorB = sum(mySample=="A") + sum(mySample=="B")
propAorB/myN
## [1] 0.674
OR
propAorB = 1 - propC
propAorB
## [1] 0.674
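Because A and B are mutually exclusive, the exact probability of the complex event is the sum of their probabilities; equivalently, it is one minus the probability of the only other outcome, C. With the values used in the simulation:

```r
pA <- 3/6; pB <- 1/6; pC <- 2/6
pAorB <- pA + pB     # addition rule for mutually exclusive outcomes
pAorB                # 0.6666667
1 - pC               # same answer via the complement
```

So the simulated 0.674 is an estimate of the exact value 2/3.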
What if we want to simulate many samples to get a sampling distribution of how likely we are to get A or B?
nReps <- 100
sampleFunction <- function(myN){
mySample = sample(x=c("A","B","C"),
size = myN,
replace=TRUE,
prob = c(pA, pB, pC))
propA = sum(mySample=="A")/myN
propB = sum(mySample=="B")/myN
propC = sum(mySample=="C")/myN
return(propA + propB)
}
mySamples <- replicate(nReps, sampleFunction(500))
hist(mySamples, main="", xlab = "proportion A or B")
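The center and spread of this histogram can also be predicted. If each of n balls independently lands in A or B with probability p, the proportion has mean p and standard deviation sqrt(p(1 - p)/n) (the binomial standard error). A sketch comparing that formula with a simulated sampling distribution; it uses a condensed but equivalent version of `sampleFunction` and 1000 replicates (an arbitrary choice) so the comparison is smoother.

```r
pA <- 3/6; pB <- 1/6; pC <- 2/6
sampleFunction <- function(myN){
  mySample <- sample(x = c("A", "B", "C"), size = myN,
                     replace = TRUE, prob = c(pA, pB, pC))
  mean(mySample == "A") + mean(mySample == "B")   # propA + propB
}
mySamples <- replicate(1000, sampleFunction(500))

p <- pA + pB
c(simulatedMean = mean(mySamples), expectedMean = p)
c(simulatedSD = sd(mySamples), expectedSD = sqrt(p * (1 - p) / 500))
```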
What if events are not exclusive?
How do we calculate the probability of falling through A and B?
We’ll start by assuming that falling through A does not affect the probability of falling through B (we’ll revisit this later).
If \(P(A) = 0.5\) and \(P(B) = 0.5\), what is the probability of falling through both A and B?
How can we edit this sample function to return the probability of falling through both A and B if \(P(A) = 0.5\) and \(P(B) = 0.5\)?
sampleFunction <- function(myN){
mySample = sample(x=c("A","B","C"),
size = myN,
replace=TRUE,
prob = c(pA, pB, pC))
propA = sum(mySample=="A")/myN
propB = sum(mySample=="B")/myN
propC = sum(mySample=="C")/myN
return(propA + propB)
}
pA = pB = 0.5
sampleFunction2 <- function(myN){
mySampleA = sample(x=c(1,0),
size = myN, replace=TRUE,
prob = c(pA, 1-pA))
mySampleB = sample(x=c(1,0),
size = myN, replace=TRUE,
prob = c(pB, 1-pB))
myCombined = mySampleA + mySampleB
probBoth = sum(myCombined==2)/myN
return(probBoth)
}
mySamples <- replicate(100,sampleFunction2(100))
hist(mySamples, main="", xlab = "proportion A and B")
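Because A and B are simulated independently, the histogram should center on the product \(P(A)P(B) = 0.5 \times 0.5 = 0.25\). A sketch that checks this and shows a shorter, vectorized way to draw the 0/1 outcomes with `rbinom()`; this is a swapped-in alternative to `sample()`, not the approach used above, and `sampleFunctionVec` is a hypothetical name.

```r
pA <- pB <- 0.5
myN <- 100

sampleFunctionVec <- function(myN){
  # rbinom(n, size = 1, prob) draws n independent 0/1 outcomes
  a <- rbinom(myN, size = 1, prob = pA)
  b <- rbinom(myN, size = 1, prob = pB)
  mean(a == 1 & b == 1)   # proportion of balls through both A and B
}
mySamples <- replicate(1000, sampleFunctionVec(myN))
mean(mySamples)   # should be close to pA * pB = 0.25
```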
Conditional probabilities describe the probability of outcome B given outcome A.
Conditional probabilities are really useful for thinking about the relationships between different events.
\(P(A|B)\) is the probability of A conditional on B.
If \(P(A|B)=0\), A and B are mutually exclusive.
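These statements follow from the standard definition of conditional probability (for \(P(B) > 0\)):

\[ P(A|B) = \frac{P(A \text{ and } B)}{P(B)} \]

Rearranging gives the multiplication rule:

\[ P(A \text{ and } B) = P(B|A)\,P(A) \]

If \(P(A|B) = 0\), then \(P(A \text{ and } B) = 0\), so A and B never happen together. If A and B are independent, \(P(B|A) = P(B)\) and the rule reduces to \(P(A)P(B)\), which is why \(P(A) = P(B) = 0.5\) gives \(0.25\) for falling through both.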
Warning
In R, | means or
This is very unfortunate.
Stay safe!
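To see the difference in R: `|` is element-wise logical OR and `&` is element-wise AND. Neither has anything to do with conditioning, so \(P(A|B)\) has to be simulated by filtering, for example by keeping only the trials where B happened. A small sketch with made-up TRUE/FALSE vectors `a` and `b`:

```r
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)
a | b        # element-wise OR:  TRUE  TRUE  TRUE FALSE
a & b        # element-wise AND: TRUE FALSE FALSE FALSE
# P(A|B) estimated by keeping only the trials where B is TRUE
mean(a[b])   # 0.5: A happened in 1 of the 2 trials where B happened
```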
Work with your teams to modify our sampling function to simulate the following scenario: \(P(A) = 1/3\), \(P(B|A) = 1/2\), \(P(B|\text{not } A) = 3/4\)
How often do we find A and B?
Note that this is called a shared event, made up of multiple non-exclusive outcomes.
pA = 1/3
pB_given_A = 1/2
pB_given_notA = 3/4
#write a function to do one trial (one ball)
#write a function to get a sample by running that trial 500 times
#write a function to generate 100 samples
#write a function to do one trial (one ball)
oneTrial <- function(){
mySampleA = sample(x=c(1,0),
size = 1, replace=TRUE,
prob = c(pA, 1-pA))
if (mySampleA == 1){pB = pB_given_A}
else{pB = pB_given_notA}
mySampleB = sample(x=c(1,0),
size = 1, replace=TRUE,
prob = c(pB, 1-pB))
myCombined = mySampleA + mySampleB
isBoth = sum(myCombined==2)
return(isBoth)
}
#write a function to get a sample by running that trial 500 times
sampleFunction3 <- function(myN){sum(replicate(myN, oneTrial()))/myN}
#write a function to generate 100 samples
mySamples <- replicate(100,sampleFunction3(500))
hist(mySamples, main="", xlab = "Prob of A and B")
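The simulation can be checked against exact values. For the scenario stated in the exercise (\(P(A) = 1/3\), \(P(B|A) = 1/2\), \(P(B|\text{not } A) = 3/4\)), the multiplication rule gives \(P(A \text{ and } B) = P(B|A)P(A) = 1/6\), and adding up the two ways B can occur gives \(P(B) = P(B|A)P(A) + P(B|\text{not } A)P(\text{not } A) = 1/6 + 1/2 = 2/3\). A sketch of a vectorized one-ball simulation using `rbinom()`, whose `prob` argument can be a vector so each ball gets its own probability of B; the variable names and the 100000 balls are illustrative choices.

```r
pA <- 1/3
pB_given_A <- 1/2
pB_given_notA <- 3/4
myN <- 100000

a <- rbinom(myN, size = 1, prob = pA)
# each ball's chance of B depends on whether it fell through A
b <- rbinom(myN, size = 1, prob = ifelse(a == 1, pB_given_A, pB_given_notA))

c(simAandB = mean(a == 1 & b == 1), exactAandB = pB_given_A * pA)
c(simB = mean(b == 1),
  exactB = pB_given_A * pA + pB_given_notA * (1 - pA))
```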
Learning goals:
Learn what probability is and the law of total probability.
Write basic simulations in R to build intuition about:
-simple mutually-exclusive events
-complex mutually-exclusive events
-shared non-exclusive events
-conditional probabilities